Add warning message for beta and gamma parameters #31654
amyeroberts merged 5 commits into huggingface:main
Conversation
src/transformers/modeling_utils.py
Outdated
| if "beta" in loaded_keys or "gamma" in loaded_keys: | ||
| logger.warning( | ||
| f"Parameter names `gamma` or `beta` for {cls.__name__} will be renamed within the model. " | ||
| f"Please use different names to suppress this warning." | ||
| ) |
I don't think this is quite right: this assumes the weight is called "beta" in the state dict, but it could be called "layer.beta".
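A minimal sketch of a check that would also catch such nested names (illustrative only, not the final diff; `loaded_keys`, `logger`, and `cls` as in the snippet above):

```python
# Warn on any key that merely contains "beta" or "gamma" as a substring,
# e.g. "layer.beta", not only keys that are exactly "beta"/"gamma".
if any("gamma" in key or "beta" in key for key in loaded_keys):
    logger.warning(
        f"Parameter names containing `gamma` or `beta` for {cls.__name__} "
        "will be renamed within the model. Please use different names to "
        "suppress this warning."
    )
```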
amyeroberts left a comment
Hi @OmarManzoor,
Thanks for addressing this! We want to make sure we catch every place where the renaming happens, i.e. anywhere `"gamma" in key` or `"beta" in key` is True (so `key` can be a longer string that merely contains "beta" or "gamma"). As you've added, this would be in `_load_pretrained_model`, but also in `_load_state_dict_into_model`.
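For context, the renaming applied during loading is a substring replacement on the state dict keys, roughly as in this simplified sketch (the actual helper in `modeling_utils.py` may be structured differently):

```python
# Simplified sketch of the key renaming performed while loading a state dict:
# legacy TensorFlow norm-layer names are mapped onto the PyTorch convention.
def fix_key(key: str) -> str:
    if "gamma" in key:
        key = key.replace("gamma", "weight")
    if "beta" in key:
        key = key.replace("beta", "bias")
    return key

keys = ["encoder.layer.0.gamma", "encoder.layer.0.beta", "encoder.layer.0.weight"]
print([fix_key(k) for k in keys])
# ['encoder.layer.0.weight', 'encoder.layer.0.bias', 'encoder.layer.0.weight']
```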
Hi @amyeroberts
amyeroberts left a comment
Hi @OmarManzoor, thanks for iterating on this!
Given the diff, I'm slightly confused: were there no warnings being triggered before? From the tests and logging messages it seems like there were.
More importantly, we should check that the parameter is renamed as well
OmarManzoor
I tried this out and it seems that the parameter is not renamed at all. Basically, when we load the model using `from_pretrained`, the parameter is still present under the name `gamma_param`.
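A minimal reproduction of the behaviour described here (a hypothetical `TinyModel`, not code from the PR):

```python
import torch
from transformers import PretrainedConfig, PreTrainedModel

class TinyConfig(PretrainedConfig):
    model_type = "tiny"

class TinyModel(PreTrainedModel):
    config_class = TinyConfig

    def __init__(self, config):
        super().__init__(config)
        # The parameter name deliberately contains "gamma".
        self.gamma_param = torch.nn.Parameter(torch.ones(1))

    def forward(self):
        return self.gamma_param

TinyModel(TinyConfig()).save_pretrained("/tmp/tiny-gamma")

# On load, the checkpoint key "gamma_param" is renamed to "weight_param",
# which no longer matches any module attribute, so the saved value is not
# restored; the module itself still exposes `gamma_param`.
reloaded = TinyModel.from_pretrained("/tmp/tiny-gamma")
print([name for name, _ in reloaded.named_parameters()])  # ['gamma_param']
```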
amyeroberts
It shouldn't rename the value in the model, but it will rename the key in the state_dict, I believe. Could you dive into the loading logic and verify what's happening?
OmarManzoor
I tried updating the tests. Could you kindly have a look?
I basically removed the warning code that I had added in the post init method. Should that be kept?
amyeroberts
@OmarManzoor Ah, OK. I think the diff was rendering funny on GitHub. Should be OK.
amyeroberts left a comment
Looks great - thanks for adding and iterating on this!
OmarManzoor
Thank you.
Why have you added warnings only for the initialization process and not for renaming during loading as well? The model I'm using is timm's ConvNeXt (and timm is even a companion framework to transformers), which has a parameter named gamma. When loading, it just tells me that gamma wasn't loaded successfully, without telling me why. I think the user should be informed when the state_dict keys are renamed; otherwise it causes unnecessary confusion.
What does this PR do?
This adds a warning message to notify about the renaming of `gamma` and `beta` parameters during initialisation and also during loading.

Fixes #29554
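The warning can be checked programmatically by capturing the `modeling_utils` logger, roughly as the tests do (a sketch; `TinyModel` is the hypothetical model from the reproduction above, and the exact test code may differ):

```python
from transformers import logging as hf_logging
from transformers.testing_utils import CaptureLogger

logger = hf_logging.get_logger("transformers.modeling_utils")

# Loading a checkpoint whose keys contain "gamma"/"beta" should now log
# the rename warning; capture the logger output and assert on it.
with CaptureLogger(logger) as cl:
    reloaded = TinyModel.from_pretrained("/tmp/tiny-gamma")

assert "gamma" in cl.out
```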
Who can review?
@amyeroberts